Inter-line Distance Estimation and Text Line Extraction for Unconstrained Online Handwriting

Author

  • EUGENE H. RATZLAFF
Abstract

Methods for detecting and extracting whole text lines from unconstrained online handwritten text are described. The general approach is a “bottom-up” clustering of discrete strokes into small groups that are then merged into isolated lines of text. Initial clustering of strokes into groups is based on combined temporal and spatial stroke proximity. Spatial stroke proximity is gauged relative to estimated inter-line distance and mean character height. Two methods applicable to off-line or on-line data are described for estimating the inter-line distance: autocorrelation of the Y-axis projection histogram, and a fitting function. Inter-line distance is accurately determined for 99% of all text pages. Text line extraction accuracy on letters (correspondence) is 98.7% and on tables is 94.9%.
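As an illustration of the first estimation method named in the abstract, the following is a minimal sketch of inter-line distance estimation by autocorrelation of the Y-axis projection histogram. The abstract does not give the paper's implementation details, so the function name `estimate_interline_distance`, the `bin_size` parameter, and the peak-picking rule (take the strongest autocorrelation peak after the first negative lag) are all assumptions made for this example, not the author's procedure.

```python
import numpy as np


def estimate_interline_distance(y_coords, bin_size=1.0):
    """Hypothetical helper: estimate line pitch from ink Y coordinates.

    Builds a Y-axis projection histogram, autocorrelates it, and returns
    the lag of the strongest peak found after the zero-lag lobe.
    Returns None when no periodic structure is detected.
    """
    y = np.asarray(y_coords, dtype=float)
    y_min = y.min()
    n_bins = int(np.ceil((y.max() - y_min) / bin_size)) + 1

    # Y-axis projection histogram: amount of ink in each horizontal band.
    hist, _ = np.histogram(
        y, bins=n_bins, range=(y_min, y_min + n_bins * bin_size)
    )

    # Remove the mean so that gaps between lines contribute negatively.
    h = hist - hist.mean()

    # Autocorrelation for non-negative lags 0 .. n_bins - 1.
    ac = np.correlate(h, h, mode="full")[len(h) - 1:]

    # Assumed rule: skip the zero-lag lobe by starting at the first
    # negative value, then take the global maximum that follows.
    negative = np.flatnonzero(ac < 0)
    if negative.size == 0:
        return None
    start = negative[0]
    lag = start + int(np.argmax(ac[start:]))
    return lag * bin_size


# Synthetic page: five text lines spaced 50 units apart, each 12 units tall.
offsets = np.linspace(-6.0, 6.0, 49)
centers = np.arange(5) * 50.0
sample_y = (centers[:, None] + offsets[None, :]).ravel()

pitch = estimate_interline_distance(sample_y)
print("estimated inter-line distance:", pitch)
```

Because the sketch uses only Y coordinates, it applies equally to sampled pen points from online data and to ink-pixel rows from offline images, which is consistent with the abstract's remark that the estimation methods suit both kinds of data.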

Similar resources

Hierarchical Decomposition of Handwritten Manuscripts Layouts

In this paper we propose a new approach to improve electronic editions of literary corpus, providing an efficient estimation of manuscripts pages structure. In any handwriting documents analysis process, structure recognition is an important issue. The presence of variable inter-line spaces, of inconstant base-line skews, overlappings and occlusions in unconstrained ancient 19th handwritten doc...


Experiments in Unconstrained Offline Handwritten Text Recognition

A system for off-line handwritten text recognition is presented. It is characterized by a segmentation-free approach, i.e. whole lines of text are processed by the recognition module. The methods used for pre-processing, feature extraction, and statistical modelling are described, and several experiments on writer-independent, multiple writer, and single writer handwriting recognition tasks are...


Using an artificial neural network approach for off-line sentence segmentation

This paper works with an Artificial Neural Network (ANN) architecture to segment unconstrained English handwriting sentences into single words. The ANN receives a feature set of the handwritten text line and classifies each image’s column belonging to a word or a gap between words. As result, the sequences of columns with the same classification represent the segmented words or inter-word gaps....


Text Line Segmentation and Word Recognition in a System for General Writer Independent Handwriting Recognition

In this paper we present a system for recognizing unconstrained English handwritten text based on a large vocabulary. We describe the three main components of the system, which are preprocessing, feature extraction and recognition. In the preprocessing phase the handwritten texts are first segmented into lines. Then each line of text is normalized with respect to skew, slant, vertical positi...


Microsoft Word - CONTENTS-AUGUST07

The last two decades witnessed some advances in the development of an Arabic character recognition (CR) system. Arabic CR faces technical problems not encountered in any other language that make Arabic CR systems achieve relatively low accuracy and retards establishing them as market products. We propose the basic stages towards a system that attacks the problem of recognizing online Arabic cur...



Publication date: 2000